While its theories, philosophies, research designs, and statistical
methods are of great importance to any science, perhaps most fundamental
to a science are its measuring instruments. These instruments
operationalize the epistemic relationship between the constructs of a
science and reality. Without valid measuring devices a science will
remain forever in the dark ages. Thus, it is particularly disturbing to
find an invalid measuring instrument in the form of an attitude scale
beginning to appear in the experimental literature in communication.

Several recent award winning papers at SCA and ICA conventions 1 and
at least two articles recently published have used an attitude scale often
referred to as a "known interval" scale. 2 While this scale was assuredly
inspired by laudable motives, it can produce spuriously inflated correlation
coefficients, spuriously high reliability and validity estimates, and spurious
significance on statistical tests. It is the purpose of this paper to
demonstrate these results, and to argue that the scale should not be used in
future studies.

The reasoning will take two general lines. First, the effects of the
scale on reliability, validity, and significance testing will be demonstrated.
Then the reasoning behind the scale and the method of obtaining its values
will be discussed.



Figure 1 about here 



The "known interval" scale, referred to hereafter as the 7.8 scale, 3
is reproduced in Figure 1. The parentheses around the numbers indicate that
subjects responding to the scale are presented with only the blanks and the
anchoring terms, not with the numerals. The scale was developed because
"[it] recognizes the non-equal nature of people's perceptions of attitude
scales and is appropriate for use with interval statistics." 4 I assume
that the term "interval statistics" refers to parametric tests. Rather
than arguing this point in detail, I simply note that while interval scales
are helpful if they truly are interval scales, they are not a prerequisite
to making a statistical inference based on a parametric test. 5 But ignoring
the rationale behind the scale for the moment, let us consider the effects
of employing the scale in communication research.

CORRELATION AND TESTS OF SIGNIFICANCE 
One of the principal defenses of the 7.8 scale is that it correlates
highly with semantic differential scales, higher than does a single seven
point scale.

     One item scales have been called notoriously unreliable by many.
     However, the carefully constructed known-interval scale used in this
     study had an extremely high correlation with the semantic
     differential scales which indicates it is not unreliable. Second,
     it had predictive validity and produced the same findings as the
     semantic differential type items. It has two significant
     advantages: (1) it is much easier to administer than the semantic
     differential type items, and (2) it yields a lower within error
     estimate which reduces the likelihood of obscuring significant
     results when in fact they do exist. 6
The reliability and predictive validity claimed for the 7.8 scale
are due to two sources: (a) the reliability and predictive validity of a
"regular" seven point scale (however large that is) and (b) the spurious
increase in these components produced by counting a one unit shift as
though the subject had moved 1.8 units. To understand why this is so,
consider the effect of extreme scores on the value of a Pearson r. Figure
2 partially illustrates this effect. The scores of the first six subjects
are negatively correlated, r = -.26. When the scores of the seventh subject
are included, the correlation increases to +.29. These values are obtained
using the ordinary seven point scale. Suppose that in place of 7 we use 7.8.
The correlation is increased to +.42. If the 7 is replaced by 10,
r_10 = +.66. By raising the value of this single extreme score, it is
possible to increase r as high as one desires. Thus, r_20 = +.93 and
r_100 = + [illegible]. Changing the values of the first six scores from
regular to 7.8 form has a negligible effect on r. Notice that the first
six number pairs remain negatively correlated despite changes in the
overall r due to the value assigned to the extreme score pair. Notice also
that the changes in r are in no way related to any real behavioral event.
Subject number seven checked the extreme score for administration X_1 and
again for administration X_2. He did this only once. The observed changes
in r are due solely to the value assigned to the extreme score after the
data have been collected. Had subject number seven checked the seventh
blank on X_1 and the first blank on X_2, repeating the above procedure of
increasing the scale value of the seventh blank would produce an
increasing correlation in the negative direction.

Figure 2 about here
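
The effect described above can be sketched numerically. The following is a minimal illustration only: the response data are hypothetical (the Figure 2 values are not reproduced here), and the names `pearson_r` and `r_with_top_value` are introduced purely for this sketch. It assumes six subjects whose two administrations are negatively correlated and a seventh subject who checks the top blank both times, then recomputes r as the number assigned to that top blank is raised.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical responses (NOT the Figure 2 data): six subjects whose two
# administrations are negatively correlated.
x1_first_six = [2, 3, 3, 4, 5, 5]
x2_first_six = [4, 5, 3, 4, 3, 2]

def r_with_top_value(top):
    """r over all seven subjects when the seventh subject's top-blank
    response on both administrations is scored as `top`."""
    return pearson_r(x1_first_six + [top], x2_first_six + [top])

r_first_six = pearson_r(x1_first_six, x2_first_six)
r_by_top = {top: r_with_top_value(top) for top in (7, 7.8, 10, 20, 100)}

print(f"first six subjects only: r = {r_first_six:+.2f}")
for top, r in r_by_top.items():
    print(f"top blank scored as {top:>5}: r = {r:+.2f}")
```

Nothing about the seven subjects' behavior changes between rows of the output; only the numeral assigned to the seventh blank does.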

If we interpret the correlations of Figure 2 as test-retest reliability,
then it is obvious that we can increase r_tt as high as we wish by changing
the numerical value assigned to the extreme score. It is also obvious that
this increase in r_tt has nothing to do with increased reliability. The event
of the experimenter assigning a large number to a scale value is independent
of the event of a subject consistently checking the same scale position on
repeated measures.

The demonstration above of the effect of changes in extreme score values
on r interpreted as reliability also demonstrates the effect on validity.
If each of the X_2 values in Figure 2 is multiplied by four,
X_2 becomes a scale running from 4 to 28. This is the range of the four
summated semantic differential type scales used as a criterion. In the
above quotation, the correlation between the seven and the 28 point scales
is referred to as reliability. I choose to call it validity. In either case,
multiplying the scores by a constant has no effect on r. Thus the arguments
in the preceding paragraphs apply directly to r_(7.8)(28), and the increase
in r_(7.8)(28) over r_(7.0)(28) is the spurious result of changing the scale
value. It is unrelated to the reliability or validity of the actual data.
Skeptics may work these calculations out for themselves.
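
The statement that multiplying the scores by a constant has no effect on r follows from the definition of the Pearson coefficient. A short derivation is sketched below; the constants a and b are illustrative symbols introduced here, not notation from the original text.

```latex
r_{aX+b,\,Y}
  = \frac{\operatorname{Cov}(aX+b,\,Y)}{\sigma_{aX+b}\,\sigma_{Y}}
  = \frac{a\,\operatorname{Cov}(X,Y)}{|a|\,\sigma_{X}\,\sigma_{Y}}
  = \operatorname{sgn}(a)\, r_{XY},
  \qquad a \neq 0 .
```

With a = 4 and b = 0 (the rescaling of X_2 to the 4 to 28 range), r is unchanged. Replacing only the top value 7 by 7.8 cannot be written in the form aX + b, which is why that substitution is able to change r.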

Predictive validity is claimed for the scale since it "produced the
same findings" as did the semantic differential items. This claim is true
only to the extent that any single seven interval scaling device measures
what four summated scales in semantic differential form measure. To this
extent, assigning the numbers "1" to "7" to the data, as in a regular seven
interval scale, will have predictive validity. Any predictive validity the
7.8 scale has derives directly from the predictive validity of the general
seven point case. But the predictive correspondence of the 7.8 scale to
the semantic differential scale will often be less than for the regular
seven point scale, since the 7.8 scale can produce spurious significance on
statistical tests when such significance does not in fact exist. Consider the
data of Figure 3.

Figure 3 about here



Here Y_1 and Y_2 may be interpreted as pretest and posttest scores,
respectively, of six subjects on a seven point scale. The t value for
these data is 2.318 with 5 degrees of freedom, which is not significant at
the .05 level, two-tailed. If these same data are rescored as advocated
by the 7.8 scale, they become significant (t = 2.95, df = 5, p < .05,
two-tailed). This is not an isolated instance. It can occur on any
statistical test (t, Scheffe, ANOVA, etc.) with any K sets of data,
provided only that one set of data has relatively few scores of "seven"
and that the other set(s) of data has (have) many, and that the mean of
the "non-seven" scores in one set of data is fairly close to the mean of
the "non-seven" scores in the other set(s). Such conditions are frequently
met in empirical data. The 7.8 scale is thus capable of producing
spuriously significant results.
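
The mechanism can be illustrated with a small sketch. The data below are hypothetical and are not the Figure 3 values; `paired_t` and `rescore_top` are names introduced only for this example. As a simplifying assumption, only responses of 7 are rescored to 7.8 and the lower six scale positions are left at 1 through 6. The critical value 2.571 is the tabled two-tailed .05 value of t for 5 degrees of freedom.

```python
from math import sqrt

T_CRIT_05_TWO_TAILED_DF5 = 2.571  # tabled critical t, df = 5, alpha = .05

def paired_t(pre, post):
    """t statistic for two related samples (df = n - 1)."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / sqrt(var_d / n)

def rescore_top(scores, top_value=7.8):
    """Replace every response of 7 with `top_value`; leave 1-6 unchanged
    (an assumption made for this sketch)."""
    return [top_value if s == 7 else s for s in scores]

# Hypothetical pretest/posttest responses (NOT the Figure 3 data):
# the pretest has no sevens, the posttest has three.
pre = [6, 6, 6, 3, 2, 4]
post = [7, 7, 7, 5, 4, 3]

t_regular = paired_t(pre, post)
t_rescored = paired_t(rescore_top(pre), rescore_top(post))

for label, t in (("regular 1-7 scoring", t_regular),
                 ("top blank scored 7.8", t_rescored)):
    verdict = "significant" if abs(t) > T_CRIT_05_TWO_TAILED_DF5 else "not significant"
    print(f"{label}: t = {t:.3f} ({verdict} at .05, two-tailed)")
```

The same six pairs of check marks fall on opposite sides of the critical value depending solely on the numeral assigned to the seventh blank.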

THE APPLICATION OF JONES AND THURSTONE TO
ATTITUDE MEASUREMENT

Given the effects of the 7.8 scale as demonstrated above, the reasoning
behind this scale can be examined. Two questions are asked in this
section. First, are the findings of Jones and Thurstone 7 applicable to
attitude measurement? Second, were Jones and Thurstone's findings applied
correctly in the case of the 7.8 scale?

Is Jones and Thurstone Applicable in General?

To answer the first question, consider what Jones and Thurstone did.
During the early 1950's they administered questionnaires containing 51
descriptive words and phrases (see Table 1) to 905 enlisted personnel at
Fort Lee, Virginia, who were asked to give their meanings for these words
and phrases in terms of the amount of like or dislike the words and phrases
indicate in preference for foods, such as creamed corn. These indicated
preferences were marked on a nine point scale which was anchored at both
ends and in the middle with the phrases "Greatest Dislike," "Neither Like
Nor Dislike," and "Greatest Like." The symbols "-4," "-3," ... "+2," "+3,"
"+4" appeared above the nine blanks on each scale. The responses were
scaled by the method of successive intervals, which produces normal deviates
(z scores) for each item. These normal deviates may then be interpreted
as a continuum of meaning for the 51 items, as in Table 1, which is taken,
in part, from Table 2 in Jones and Thurstone. 8

Burgoon 9 selected the seven underlined words in Table 1 as
anchors for the 7.8 scale. The values in the scale result from adding a
constant (4.1) to the z scores (scale values) obtained by Jones and Thurstone.

There is a general problem of order effects associated with the
results of Jones and Thurstone. All 905 subjects responded to all 51 items
in exactly the same order. Thus the effects of practice and fatigue are
inherently confounded with the rating of each item. The lack of
counterbalancing for order could, in my opinion, result in a rejection of
their paper should it be submitted for publication today. Thus, I believe
that the results of Jones and Thurstone are of dubious value for attitude
scaling purposes.

Beyond this general criticism, Jones and Thurstone comment on the
applicability of their results. They state that their results "might be
generalized to the extent that the phrases useful for defining successive
intervals on a food preference schedule might also be useful for defining
intervals on schedules assessing preferences for other consumer goods." 10
Note that the application of these results to the 7.8 scale has not been in
the area of consumer goods.

Was Jones and Thurstone Applied Correctly in This Case? 

Whether or not the reader accepts the general applicability of Jones
and Thurstone to attitude scaling, the application in the particular case
of the 7.8 scale is incorrect. In particular, the application of Jones and
Thurstone to the 7.8 scale was done in reverse (i.e., backwards). Instead
of choosing anchors for a regular seven point scale based on the Jones and
Thurstone results, the seven scale values were assigned on the basis of the
chosen anchors. That is, instead of selecting equally spaced words and
assigning them as anchors to seven equally spaced numerals, seven unequally
spaced words were selected and the scale values changed to conform with
these unequal intervals. Since the fallacy in this logic may not be
immediately apparent, let us form another attitude scale called the TIC
scale (Tongue in Cheek) using the same logic. For my TIC scale I will
choose the same six anchors used by the 7.8 scale for the first six scale
positions (Terrible, Bad, etc.). But in place of "Excellent" for the
seventh scale value, I choose "Best of all" for the TIC scale, resulting in
a seven point scale from 1.0 to 10.25 (see Figure 4). If the 7.8 scale is
good in terms of producing reliability, validity, and significance, the TIC
scale is spectacular. In terms of reliability and validity, its results may
be compared with the 7.8 scale from the calculations presented in the first
section of this paper (r_10 = .66, r_7.8 = .42). TIC will also outperform
the 7.8 scale in getting significance out of a given set of data.

Figure 4 about here

Unfortunately, even if this application of Jones and Thurstone to
the 7.8 scale and the TIC scale were legitimate, which it is not, the data
would still not be interpretable. A crucial assumption of both 7.8 and TIC
is that a change from "Good" to "Excellent," or to "Best of all,"
corresponds to a subject's perception of a change of 1.8 (7.8 - 6.0), or
4.25 (10.25 - 6.0), units on his perceptual scale, compared with
approximately one unit change between each of the lower six scale values.
But there is no way of knowing what part of the scale stimulus the subject
is responding to. Is he/she responding to the equally spaced intervals as
they appear on the page, or to the anchors below the blanks? The TIC and
7.8 scales assume that people are responding to the anchors rather than to
the equal spacing on the scale. The scale gives subjects an ambiguous
choice. If they really do perceive the differences between words as they
are scaled, then these distances conflict with the equal spaces between the
words on the paper. Which is the subject to choose? Which does each subject
choose? Since there is no way to know this, the 7.8 scale is ambiguous and
necessarily produces ambiguous (and thus uninterpretable) results.

Finally, there are a number of errors in the transformation of the
data between Jones and Thurstone and the 7.8 scale. In the reproduced Table
from Jones and Thurstone 11 there are ten copying errors. These errors are
presented in Table 1. One of these errors occurs on the word "Poor," which
is scaled at -1.55 by Jones and Thurstone, but appears as -1.35 in
Burgoon. 12 The effect of this error and an addition error is illustrated
in Table 2. When 4.1 is added to 0.02 the result should be 4.12, which
rounds to 4.1, not 4.0. If 4.1 is added to -1.55, the result is 2.55, which
rounds to 2.6, not 2.9. There is an addition error here, as well as a
copying error, since the copying error accounts for only .2 of the .5
discrepancy. The differences between the table values reported by Burgoon
and those listed by Jones and Thurstone also affect the language intensity
manipulations, such as found in Burgoon and Chase, and Burgoon. 13 For
example, Jones and Thurstone did not test the phrase "Mighty favorable,"
which is employed in the language intensity manipulation with a value of
2.81. This score was achieved by the phrase "Highly favorable."

Table 1 about here

Table 2 about here
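
The addition checks described above can be reproduced with a few lines of exact decimal arithmetic. This is only a sketch: the seven Jones and Thurstone scale values are those listed in Table 2, the constant 4.1 is the one named in the text, and the round-half-up rule is an assumption chosen because the text rounds 2.55 to 2.6. The name `to_scale_value` is introduced for this example.

```python
from decimal import Decimal, ROUND_HALF_UP

CONSTANT = Decimal("4.1")  # constant added to each z score, per the text

# Jones and Thurstone scale values for the seven anchors, as listed in Table 2.
jt_values = {
    "Terrible": Decimal("-3.09"),
    "Bad": Decimal("-2.02"),
    "Poor": Decimal("-1.55"),
    "Neutral": Decimal("0.02"),
    "Fair": Decimal("0.78"),
    "Good": Decimal("1.91"),
    "Excellent": Decimal("3.71"),
}

def to_scale_value(z):
    """Add the constant and round to one decimal place (half up, assumed)."""
    return (z + CONSTANT).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

computed = {word: to_scale_value(z) for word, z in jt_values.items()}

# The two 7.8 scale values that the text identifies as incorrect.
reported = {"Poor": Decimal("2.9"), "Neutral": Decimal("4.0")}

for word, value in reported.items():
    print(f"{word}: reported {value}, computed {computed[word]}")
```

Exact decimals are used rather than binary floats so that a sum such as 2.55 rounds the way it would by hand.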

CONCLUSION

This paper has attempted to show that increases in reliability and
validity coefficients obtained with the 7.8 scale have nothing to do with
observed data and, thus, have nothing to say about observed data. The
increase only confirms that by artificially extending the range of a scale
it is possible to increase a correlation coefficient. A linear
transformation, which preserves the relative intervality of the data, would
not affect r. It is the fact that the Jones and Thurstone scale values,
whether transcribed correctly or incorrectly, happen to form a nonlinear
transformation, with resultant increased r and increased chance for
significance, that makes the scale appealing on its surface.

Rather than trying to adjust the values on the instruments after
the horse is out of the barn (after the data have been collected), researchers
in communication should develop more valid and reliable instruments for
measuring important communication variables. This was assuredly the intent
behind the formation of the 7.8 scale. But reliability and validity, to be
useful concepts, must be the reliability and validity of data gathering
instruments, not the reliability and validity of a particular set of scale
values.
Figure 1. Comparison of the usual seven point scale values
with values proposed for the 7.8 scale.

Figure 2. Illustration of the effect of extreme scores
on a Pearson r.

Figure 3. Illustration of spurious significance produced
on t-test for two related samples using the 7.8
scale.

Figure 4. The TIC attitude scale.

Table 2

Effect of a copying error and incorrect addition
on two of seven values in the 7.8 scale.

                      Terrible   Bad     Poor    Neutral   Fair    Good   Excellent

Jones and Thurstone
Scale Values           -3.09    -2.02   -1.55     0.02     0.78    1.91     3.71

7.8 Scale Values        1.0